Chapter 2.1. Introduction to Game Music History

This chapter will review the various dynamic music approaches throughout the history of game development and identify the advantages and disadvantages of different methods.

Summary

The earliest video game composers in the 1970s and early 1980s wrote music by programming audio hardware to generate a limited series of tones in real time during gameplay. As MIDI became standardized, composers were able to use dedicated tools to create MIDI files. These files were compiled into the game's program and sent instructions to the player's on-board sound card, which generated the sound in real time alongside gameplay. This method allowed composers to script music that could unfold in a number of different ways depending on the player's actions, and it remained the dominant approach from the mid-1980s into the 1990s. When the CD-ROM came into use in the 1990s for storing and playing games, composers were able to record live music and store it on the discs as AFFs (audio file formats, such as .wav, .mp3, .ogg, etc.). The AFFs were triggered by the game and streamed alongside gameplay, so that recordings of live instruments could accompany the action. As DVDs and digitally downloadable games became more popular, storage limitations largely disappeared. This allowed composers to implement more music than was previously possible on CDs, creating the potential for more dynamic music.

This summary of dynamic video game music history describes the two fundamental methods that have been used for creating video game music. The first is music that is generated in real time by sending some form of instructions to a hardware component; the second is the recording of live instruments as AFFs that are embedded within the game to be triggered and streamed back during gameplay. The first will be referred to as "generative music" (Chapter 2.2), the second as "streaming music" (Chapter 2.3). Note that my use of the term "generative music" is meant as a literal description of how the sound is processed and amplified, and not as a way to describe musical notes that are generated by any sort of real-time algorithmic compositional processes. The methods that gained popular use throughout history were those that emphasized specific musical features valued by game developers at the time.

__________________________________________________



Chapter 2.2. 1972-1991: Hardwired Circuitry to CD

2.2.1.
The earliest games with audio generated simple tones by amplifying electrical signals within the processor's circuitry and sending them to a piezoelectric speaker, more commonly referred to as a 'beeper' (Collins, 2012, 119). One of the earliest games to use this technique was "Pong" (Atari, 1972). The beeper had severe limitations imposed by the lack of dedicated audio hardware. Pong's engineer, Allan Alcorn, describes the process of generating audio by "poking around" the hardware's circuitry to find various tones that worked with the game (Shea, 2008). The sounding of various tones in response to the player's actions illustrates one of the first examples of interactive game audio. One could consider the rhythmic tones that sound as the ball bounces back and forth between the paddles to be the first example of interactive music in games.

The first use of composed melodies in games could be traced back to the mechanical arcade and gambling machines of the early 1900s, which used recorded music to circumvent gambling restrictions (Collins, 2016). However, it was not until 1975 that the first composed melody appeared in a video game: "Gunfight" (Taito, 1975) used the Intel 8080 microprocessor to generate a pitch set.

The music in Gunfight was a short melodic hook from Chopin's "Funeral March" that played when the player lost. Even though the music could not play continuously throughout the game, the creative decision by developers Taito and Midway Games to include music signaled that music was a desired element for communicating a message to the player, and music continued to be a feature in games in the years that followed. The music in Gunfight demonstrated music being used as an interactive feature: if the player loses, then play the melody.

2.2.2.
"Rally-X" (Namco, 1980) is thought to be the first game with continuous music looping throughout the duration of the gameplay (Fritsch, 2013, 13). This became achievable by using a dedicated audio chip. Dedicated audio chips were developed to allow audio designers and composers to generate tones on an increased number of channels with a greater number of possible sound waves (Kruse, 2010). "Tempest" (Atari, 1981) used two dedicated audio chips called POKEY chips. Each had four separate audio channels that could simultaneously produce a range of frequencies and allowed for controllable volume (Krap, 2005; Sayer, 1999). Another notable audio chip was the C64's SID chip (1982), which allowed for a greater range of timbres, filters, and control over the sound's envelope (attack, decay, sustain, and release) (Fritsch, 2013, 15). While Tempest and many games on the C64 did not use their dedicated audio chips to implement highly interactive music, the range of features on dedicated audio chips opened up the possibility for music (and sound more generally) to be interactive in a variety of new ways.

"Frogger" (Konami, 1981) was the first game to develop the concept of adaptive music. In Frogger, different stages of gameplay facilitate sudden changes in musical material (Sweet, 2015, 90). Frogger demonstrated that music could act as a continuous feedback mechanism for communicating information to the player about different game states. In the classic arcade version of Frogger, the player controls a frog and must maneuver it across a busy road and over a dangerous river to reach a 'safe zone'. There are several safe zones just past the river. When the player successfully reaches a safe zone, the zone becomes filled. A new frog appears at the start and the player must repeat the journey, aiming to fill a different safe zone. When all the safe zones are filled, the game resets at a harder difficulty: the cars on the road move faster and the river has more dangerous obstacles.

When the player starts the game, continuous background music plays. When the player successfully reaches a safe zone, the music jumps to a different part of the soundtrack. When the player dies, a unique musical phrase plays and the game restarts from the beginning. The transition to a new musical phrase is abrupt: the previous music suddenly stops, there is a brief pause, and then the new phrase begins. Having a music system that brings attention to various objective statuses in the game was revolutionary at the time, but has since become an expectation of today's video game soundtracks.
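The logic behind Frogger's adaptive score amounts to a set of hard-cut responses to discrete game events. The sketch below is a minimal Python illustration of that idea only; it is not Frogger's original arcade code, and the cue names and MusicPlayer interface are hypothetical.

```python
# Minimal sketch of Frogger-style hard-cut music switching.
# Illustrative Python only, not the original arcade code; the cue names
# and the MusicPlayer interface are hypothetical.

class MusicPlayer:
    def play(self, cue_name):
        print(f"now playing: {cue_name}")

    def stop(self):
        print("music stopped (abrupt cut)")


def on_game_event(event, music):
    """Map discrete game states to abrupt music changes, as described above."""
    if event == "game_start":
        music.play("background_loop")
    elif event == "safe_zone_reached":
        music.stop()                          # previous music cuts out
        music.play("safe_zone_section")       # jump to a different part of the score
    elif event == "player_died":
        music.stop()
        music.play("death_phrase")
        music.play("background_loop")         # game restarts from the beginning


music = MusicPlayer()
for event in ("game_start", "safe_zone_reached", "player_died"):
    on_game_event(event, music)
```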

Guy Whitmore, a contemporary pioneer of adaptive scoring, expresses the industry’s current outlook on the necessity of adaptive music:

“As an audio community, we need to get past the idea of whether or not adaptive music is appropriate or inappropriate for a given genre, or for games in general. If music is called for in an 'interactive' game where specific timings are unknown, then adaptive music is appropriate. End of story” (Brandon, 2002).


2.2.3.
Many of the advancements in chip technology throughout the 1980s increased the range of timbres that could be generated. When the first 16-bit console, the TurboGrafx-16 (known in Japan as the PC Engine), was released in 1987, it included an FM synthesizer capable of generating recognizable instrument timbres such as flutes, drums, brass, and strings (Collins, 2008, 41).

Nintendo's SNES console, released in 1991, included the SPC-700 chip, a digital signal processor with a built-in synthesizer. The SPC-700 allowed for programmable features like echo, reverb, chorus, time-stretching, compression, equalization, and filtering, and was also the first chip to take advantage of the newly established General MIDI standard (Collins, 2008, 50). The SPC-700 came stock with instrument presets that were more realistic than those achievable by FM synthesis, and was also more user-friendly, as it allowed composers to convert MIDI data "into files executable by the sound processor…" (Collins, 2008, 46). Many home computers at the time also contained sound chips that took advantage of the MIDI instrument presets.

The increased use of standardized MIDI in video games led to the development of a new type of audio software called iMuse, created by Michael Land and Peter McConnell in 1991. iMuse was a custom music engine designed to improve video game music's ability to adapt to gameplay. It allowed musical material to be altered in real time by changing the MIDI data on the fly. Another innovation of iMuse was its ability to perform conditional musical transitions seamlessly (Mendez, 2005; source taken from Collins, 2008). An example of the effectiveness of iMuse can be observed in "Monkey Island 2: LeChuck's Revenge" (LucasArts, 1991), which utilized many of the aforementioned features. iMuse offered not only a way for music to adapt to player actions, but a way for it to adapt musically.

In Monkey Island 2: LeChuck's Revenge, the player navigates through the game by clicking on locations they want the avatar to move toward. When the player moves to a new location, the screen loads a new setting. Music plays continuously throughout the entire game, and each setting has a different soundtrack. Sometimes the musical differences from one setting to the next are minimal, such as swapping one instrument for another while the current melody continues. Other times the soundtrack of the next setting is a completely different musical piece. The game transitions smoothly between these settings by using iMuse. Because iMuse used MIDI data, different instruments could be swapped in real time to play the same material. All the transitions in Monkey Island appear to be quantized to happen at the end of a musical phrase, which was possible due to iMuse's ability to keep track of pre-decided eligible transition points (Mendez, 2005; source taken from Collins, 2008).
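iMuse itself was a proprietary MIDI engine whose internals are not publicly documented, but the behavior described above can be illustrated with a short sketch: a requested transition is deferred until the next pre-decided eligible transition point, such as the end of a phrase. The Python below only illustrates that scheduling idea; the beat positions and cue names are hypothetical.

```python
# Sketch of phrase-quantized music transitions in the spirit of iMuse.
# Not iMuse's actual implementation; beat counting and cue names are hypothetical.

class TransitionScheduler:
    def __init__(self, transition_points):
        # beats at which a transition is musically allowed (e.g. ends of phrases)
        self.transition_points = set(transition_points)
        self.current_cue = "woodtick_theme"
        self.pending_cue = None

    def request_transition(self, next_cue):
        """Called when the player enters a new location; the change is deferred."""
        self.pending_cue = next_cue

    def on_beat(self, beat):
        """Called once per beat by the sequencer; switch only at eligible points."""
        if self.pending_cue and beat in self.transition_points:
            self.current_cue = self.pending_cue
            self.pending_cue = None
            print(f"beat {beat}: transitioned to {self.current_cue}")


scheduler = TransitionScheduler(transition_points=[8, 16, 24, 32])
scheduler.request_transition("largo_theme")   # player clicks a new location
for beat in range(1, 17):
    scheduler.on_beat(beat)                   # transition lands on beat 8
```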

__________________________________________________

With the advent of CD-ROM technology in 1992, it became possible for composers to record live music and have it play back during specific moments of gameplay (Collins, 2008, 63). This technology changed the way video game music was created and used, as composers began using real instruments instead of MIDI.

__________________________________________________



Chapter 2.3. 1992-Present: Streaming Audio

Up until the early 1990s, video game music relied on the programming of audio hardware to generate tones in real time. While different machines required different programming languages, the fundamental strategy for making (adaptive) music was always the same: program hardware to generate music when something happens in the game. With the arrival of CD-ROM technology, many composers abandoned generative audio in favor of live recordings. This meant that the dynamic music advantages of programs like iMuse were no longer accessible.

Through the 1990s some consoles and games continued to rely on generative audio techniques; however, the method largely became obsolete in mainstream game development (Collins, 2008, 63). While CDs provided the luxury of using real instruments, they also meant having to figure out how streaming audio could be made adaptive. Early streaming audio games like "Myst" (Cyan, 1993) struggled with this challenge. Collins describes Myst's audio scheme:

“…[B]rief themes would typically loop repetitively until the player changed location, and then would hard cut out to the ambient sounds, illustrating one problem with the rapid jump to Redbook technology: the dynamic MIDI techniques had been abandoned in favor of…linear tracks and loops” (Collins, 2008, 67).


2.3.1.
The successful launch of Sony's PlayStation in 1995 helped establish CD games as a new standard. While the PlayStation allowed composers to stream recorded music from the CD, it also had internal hardware that could be used for generating MIDI.

Generated MIDI music was often used for games that prioritized adaptive music over the use of real instrument sounds, such as "Final Fantasy VII" (Squaresoft, 1997). The Nintendo 64 (1996) used cartridges and was only capable of generative audio. Both consoles became common household items, and their coexistence illustrates a period of transition between the two technologies. Games created in the same years throughout this period can be found using either method: highly adaptive generative MIDI scores, or recorded streaming music with no adaptability whatsoever.

It is hard to pinpoint exactly when streaming music became more interactive. Thousands of CD games were released throughout the 1990s, many having little to no adaptive scoring. Systems like iMuse and DirectMusic, which were initially built to be used with MIDI, had started supporting the use of AFFs. Techniques for creating interactive streaming music had previously been developed for pinball machines like "Indiana Jones Pinball" (1993), which used layered crossfading and transitions based on when the ball crossed different boundary lines (Brian Schmidt, personal correspondence, April 8, 2019); however, those techniques were not immediately integrated into the video game industry.

To understand this period of history more clearly, I contacted Brian Schmidt, a composer who worked on the front lines of interactive music through the 1990s, on Facebook. We discussed how dynamic music developed during this transitional period. Schmidt states:

”I worked on some early PS1 and Saturn games. Music folks were excited to use Redbook audio. Of course the literal Redbook format (which some very early PS1 titles used) functioned like a CD player, so there was no 'interactive' music at all - just ‘play track x’. By the time we started experimenting with music interactivity, game developers started filling the CDs with other game data. The first game I did had over 3000 lines of dialogue and was a PS1 game, so the music had to be MIDI generated. This was because the CDs could only hold a bit over 600 megabytes of audio, and it was all being used for the dialogue” (Brian Schmidt, personal correspondence, April 8, 2019).


Schmidt describes a scenario in which the inhibiting factors for adaptive music were not creative limitations but production restrictions. Working with streaming music, according to Schmidt, was an exciting prospect, but interactive streaming music required more space than most developers were willing to give to the music. My conversation with Schmidt led to a discussion about what the earliest game to successfully use interactive streaming music might have been.

Schmidt states:

“My hunch is that there will not be a single ‘first game’ to do 'interactive music’. Rather, it will look a bit evolutionary with some measure-based transitions in this game, some crossfading in that game, some vertical layering here, some stingers there. Different techniques evolved bit by bit in the early 90’s. This stuff isn't documented very well anywhere. I think a lot is lost to history or corporate secrecy. All the source code and information we had about the interactive music systems for the pinball machines in the early 1990’s are long lost, I'm sure” (Brian Schmidt, personal correspondence, April 8, 2019).


2.3.2.
One of the earliest games with adaptive recorded music was "PaRappa the Rapper" (Sony Interactive Entertainment, 1996). It is a rhythm game in which the player must press buttons in the sequence shown to them, and must do so on specific beats of the music. The player's accuracy dictates how different layers of music fade in and out to make the game more or less exciting.
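The approach PaRappa points toward is what is now commonly called vertical layering: a single performance metric controls the mix of stacked musical layers. The sketch below is a hypothetical Python illustration of that mapping, not Sony's actual implementation; the layer names and thresholds are invented.

```python
# Sketch of vertical layering driven by player accuracy, in the spirit of
# PaRappa the Rapper's rating system. Layer names and thresholds are hypothetical.

LAYERS = ["drums", "bass", "keys", "brass"]

def layer_volumes(accuracy):
    """Fade more layers in as accuracy rises (0.0 to 1.0)."""
    volumes = {}
    for i, layer in enumerate(LAYERS):
        threshold = i / len(LAYERS)          # 0.0, 0.25, 0.5, 0.75
        # each layer fades in over a 0.25-wide band above its threshold
        volumes[layer] = max(0.0, min(1.0, (accuracy - threshold) * 4))
    return volumes

for acc in (0.2, 0.5, 0.9):
    print(acc, layer_volumes(acc))
```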

Another of the earliest games to use adaptive streaming music was “Need for Speed 2” (Electronic Arts, 1997). The audio director for the game, Alistair Hirst, explains:

“Need for Speed 2 (1997) had interactive streaming music coming off of the CD (WAV for PC, or VAG format for PSX). I finished programming the music tool (started by Ian Macanulty) called Pathfinder, which allowed a tree of nodes to be built to link up two bar sections of music. That tech was then developed further for the game ‘SSX Tricky’. There were three levels of intensity for each tune, which varied based on your position in the race and your position on the track. In the event of a crash it would jump instantly to the crash music node, which would then lead back to the correct music for that track section and race position” (Alistair Hirst, personal correspondence, April 8, 2019).


As previously described by Schmidt, Redbook audio did not support any commands that would allow music to be interactive. Getting the streaming music to work adaptively in the way Hirst describes therefore required implementing the audio as .wav or .vag files instead of the Redbook audio file format. With the music implemented this way, it became possible to program it to behave dynamically. Hirst describes the process as "…painfully tedious to implement" (Alistair Hirst, personal correspondence, April 8, 2019). The desire to include this feature despite the difficulty of implementation suggests that the development team believed it was a valuable and worthwhile addition to the game.
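Pathfinder itself was a proprietary internal tool, but the behavior Hirst describes — two-bar sections linked at three intensity levels, with a crash node that interrupts and then routes back — can be sketched as a simple lookup over a node graph. The Python below is purely illustrative; the file names, intensity mapping, and graph structure are assumptions, not Electronic Arts' data.

```python
# Sketch of a Pathfinder-style node graph: two-bar music sections linked at
# three intensity levels, with a "crash" node that interrupts and routes back.
# Node names and structure are hypothetical.

SECTIONS = {
    # (section_index, intensity) -> audio file to stream next
    (0, "low"): "tune_a_bar01_low.wav",
    (0, "mid"): "tune_a_bar01_mid.wav",
    (0, "high"): "tune_a_bar01_high.wav",
    (1, "low"): "tune_a_bar02_low.wav",
    (1, "mid"): "tune_a_bar02_mid.wav",
    (1, "high"): "tune_a_bar02_high.wav",
}

def intensity_for(race_position, total_cars):
    """Map race position to one of three intensity levels."""
    if race_position == 1:
        return "high"
    if race_position <= total_cars // 2:
        return "mid"
    return "low"

def next_clip(section, race_position, total_cars, crashed=False):
    """Pick the next two-bar clip; a crash jumps straight to the crash node."""
    if crashed:
        return "crash_stinger.wav"    # afterwards, playback resumes at the correct section
    return SECTIONS[(section % 2, intensity_for(race_position, total_cars))]

print(next_clip(section=0, race_position=1, total_cars=8))                 # leading: high intensity
print(next_clip(section=1, race_position=7, total_cars=8))                 # trailing: low intensity
print(next_clip(section=1, race_position=7, total_cars=8, crashed=True))   # crash node
```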

2.3.3.
"Grim Fandango" (LucasArts, 1998) was released for PC and took advantage of the new streaming audio features within iMuse, the dynamic music engine created by Land and McConnell. The soundtrack was composed by Peter McConnell, with the music designed to function like a film score with adaptive abilities (Strank, 2013, 90).

While some games were using custom music engines like Pathfinder or iMuse, many games being released during this time did not bother with interactive music, as composers often felt there was a trade-off. “The key debate that was going on back then was the production quality you could get from pre-rendered music recorded in the studio, vs. the flexibility of stuff rendered on the machine at runtime by using DirectMusic” (Alistair Hirst, personal correspondence, April 8, 2019).

The release of Sony's PlayStation 2 in 2000 and Microsoft's Xbox in 2001 increased the popularity of home gaming consoles (Jurkovich, 2018). Both consoles supported the DVD format, which increased the amount of data that a game could contain. This meant that there was more space to fit music stems for adaptive music, a luxury that CDs did not allow. Schmidt states, "In my opinion, the transformative technology for interactive streaming music for games was the DVD" (Brian Schmidt, personal correspondence, April 8, 2019). I responded by suggesting that perhaps this is why many people think of "Halo: Combat Evolved" (Bungie, 2001) and "SSX Tricky" (Electronic Arts, 2001), both released on the Xbox, as two of the earliest games with dynamic streaming music. Schmidt replied, "Yes, those are two of the earliest for sure. And the original XACT (Xbox Audio Creation Tool) had a built-in interactive music system. However, Halo used its own custom audio engine that was tailored to Marty's specific requests" (referring to Halo's composer and audio director, Marty O'Donnell) (Brian Schmidt, personal correspondence, April 8, 2019).

Halo’s soundtrack was composed by Marty O’Donnell and used a custom music system that allowed music to smoothly transition between different themes depending on the player’s location and actions. In Chapter 4 I conduct a thorough analysis of how the DMS in Halo functions.

SSX Tricky was innovative in how different musical effects corresponded with different player actions as they snowboarded down a slope. The resulting effect was that the music sounded as if it were being remixed by a DJ in real time as the player performed different tricks. Another major feature was how the game would slow down the animation just as the player was landing from a big jump so that the landing would align with the music's strong beats. SSX Tricky's soundtrack consisted of licensed music, but users were encouraged to add their own music to the game by uploading it to their console and creating a playlist. The dynamic remix features worked by utilizing a custom-made audio analyzer that constantly scanned the user's music a few beats ahead of time (Durity, 2013).

2.3.4.
In 2002 Firelight released the audio middleware FMOD, followed by Audiokinetic's middleware Wwise in 2006. These two dynamic audio programs have become standard tools used by developers and game audio professionals. Both audio middleware companies offer a wide range of tutorials and educational support for game developers and composers learning how to use the software to create dynamic music.

Most game development engines in recent years, such as Unity and Epic's Unreal Engine, allow developers to build custom dynamic music systems directly within the engine. However, the standardization of adaptive streaming music techniques can be partially attributed to the development of middleware, as these tools prioritized accessibility, ease of use, and education. For my prototype in Chapter 5 I used Wwise, with additional programming done in Unity to enhance the music's ability to follow a logic system that references previously triggered musical segments.

__________________________________________________



Chapter 2.4. Contemporary Games and Their Music

This sub-chapter is designed to provide examples of how contemporary games look, sound, and play. The games below are from various genres with different types of gameplay. Within each game the music is used to achieve different goals.

2.4.1. - Skyrim
"The Elder Scrolls V: Skyrim" (Bethesda, 2011) is an open-world single-player role-playing game (RPG). When the player is not actively progressing a story arc, they are free to explore the vast open world, which could take hundreds of hours to uncover. Through exploration the player may discover characters and locations that shape their narrative journey. Jessica Curry describes the challenges of music's role in Skyrim:

“People are putting in hundreds of hours into [video game] campaigns. Something like Skyrim, my son clocked in something like 230 hours of gameplay. And it is a totally different mindset in terms of the way you are composing as well — it is so complicated. Because you’re not just creating music for a linear experience, you have to think about player agency, they can choose to go anywhere in the game so the music has to be able to cope with those changes” (Curry, 2018).


Skyrim takes place in a fantasy world with elves, vikings, dragons, and other mythical creatures. Horses are a primary mode of transportation; weapons are limited to bows, axes, maces, and swords; and the mythology and Viking imagery throughout the game is reminiscent of early medieval Europe. While some folk melodies and instruments are used to reinforce the associated time and setting, large orchestras and choirs dominate the soundtrack with a pseudo-romantic musical style.

Different musical tracks and melodies are associated with various locations in the game. As the player navigates from one place to the next, the music changes. This is done throughout the game by fading tracks in and out from silence, crossfading between tracks, or stopping one track and starting a new one during a load screen.

Music also changes when the player transitions to and from combat. This is achieved by fading in a combat theme while simultaneously fading out the existing music. There are several different musical cues that can be used for each situation, and they are often selected at random to lessen the amount of repetition.
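As a simple illustration of the transition just described, the sketch below crossfades from an exploration track into a randomly chosen combat cue using an equal-power curve. This is not Bethesda's implementation; the track names, fade length, and curve are assumptions.

```python
# Sketch of the combat transition described above: fade out the current
# exploration track while fading in a randomly chosen combat cue.
# Track names and the fade curve are illustrative only.

import math
import random

COMBAT_CUES = ["combat_01", "combat_02", "combat_03"]

def crossfade_gains(t, duration=2.0):
    """Equal-power crossfade: returns (outgoing, incoming) gains at time t."""
    x = min(max(t / duration, 0.0), 1.0)
    return math.cos(x * math.pi / 2), math.sin(x * math.pi / 2)

def enter_combat(current_track):
    combat_track = random.choice(COMBAT_CUES)   # random pick lessens repetition
    for step in range(5):
        t = step * 0.5
        out_gain, in_gain = crossfade_gains(t)
        print(f"t={t:.1f}s  {current_track}: {out_gain:.2f}  {combat_track}: {in_gain:.2f}")

enter_combat("exploration_theme")
```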

2.4.2. - No Man’s Sky
The dynamic music in "No Man's Sky" (Hello Games, 2016) uses a highly complex DMS built by audio director Paul Weir. Weir developed an audio program called Pulse, which combines hundreds of small musical segments from the game's alt-rock soundtrack into evolving soundscapes. Weir states:

“It’s really a way of organizing lots of small, granular bits of sound [and] music components, and imposing a bit of structure and logic and control on top of them...We can define how they’re combined when they’re played in the game, and attach them to game parameters so that, depending on either what you’re doing or where you are in the game, it could be calling different bits of music. It’s combining individual small elements, so the exact mix of what you’re getting and how that plays will be different every time you play” (Epstein, 2016).


Pulse's highly complex approach allows the music to be assembled in a granular and procedural way. The player hears the same material that makes up the songs every time they play the game, but how the songs are assembled and connected is different each time. The world of No Man's Sky is procedurally generated, and the player's exploration of these worlds is a significant part of the gameplay. The music that results from the Pulse engine is an endless soundscape that accompanies the player's exploration.
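Pulse is proprietary and its internals have not been published, but the general idea Weir describes — small fragments selected and combined according to game parameters — can be illustrated with a toy example. Everything in the sketch below (fragment names, game states, selection logic) is hypothetical.

```python
# Toy illustration of procedural soundscape assembly in the spirit of Pulse:
# small musical fragments are chosen based on game parameters, so the same
# pool of material is combined differently each session. All names here
# are hypothetical.

import random

FRAGMENTS = {
    "calm":   ["pad_a", "pad_b", "guitar_swell", "sparse_piano"],
    "tense":  ["pulse_synth", "drone_low", "metallic_hit"],
    "flight": ["arp_fast", "noise_sweep", "pad_bright"],
}

def next_fragment(game_state, rng=random):
    """Pick the next granular fragment from the pool matching the game state."""
    pool = FRAGMENTS.get(game_state, FRAGMENTS["calm"])
    return rng.choice(pool)

def assemble_soundscape(states):
    """Chain fragments for a sequence of game states into one soundscape."""
    return [next_fragment(state) for state in states]

# The same sequence of states yields a differently assembled soundscape each run.
print(assemble_soundscape(["calm", "calm", "tense", "flight", "calm"]))
```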

2.4.3. - Overwatch
"Overwatch" (Blizzard, 2016) is a competitive online first-person shooter (FPS). Unlike the other games I have mentioned, Overwatch is not story-driven. The player competes online against other players in different team-based combat scenarios. Each scenario is time-based, and an objective must be achieved within that time.

The setting of Overwatch is a fictional future in which a multitude of heroes from different factions are called on to fight one another. The soundtrack consists of electronic and orchestral instruments, with an emphasis on brass fanfares to create heroic-sounding themes.

Each arena that the players compete in has its own music to match the setting. The arena set in Japan has a pentatonic melody, the arena set in the American Southwest has a country-western melody, and so forth. The music plays to introduce the arena and then does not play again for the majority of the match. The only other time the music plays is when either team is nearing completion of the objective or running out of time to complete it. This 'objective' music lets the players know that the match is nearly over, reinforcing the feeling of urgency to complete their objective in time.

2.4.4. - Sekiro
“Sekiro: Shadows Die Twice” (FromSoftware, 2019) is a single player narrative-driven game that takes place in a mythological ancient Japan. The player’s avatar is a ninja who is tasked with protecting a young boy called the Divine Heir. Throughout the game the player explores the Japanese setting while defeating enemies to progress. Different areas and bosses have different musical themes. Exploration music is usually less tense than the music that is played when the player is in combat. Boss fights often have multiple stages, each stage with different music.

The music for Sekiro is appropriate to its setting. Traditional Japanese instruments are mixed with a Western orchestra, and the music ranges from tense string textures to Japanese folk melodies.

2.4.5. - Conclusion
These descriptions of how music is used in Skyrim, No Man's Sky, Overwatch, and Sekiro may be helpful to readers who are not familiar with how music is used in modern games. Chapter 3 looks more specifically at various techniques that composers use to make music adaptive, providing visual examples and playable demonstrations.

__________________________________________________

